Add Delightful-TTS model #2095
We must test the model at least as thoroughly as we test vits.py, and testing individual layers would be even better.
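A minimal sketch of a per-layer unit test, assuming the goal is to check that each layer preserves the batch and time dimensions. `FramewiseProjection` and `shape` are hypothetical stand-ins written in plain Python so the example runs without PyTorch; a real test would build an actual Delightful-TTS layer, feed it `torch.randn` input, and compare `output.shape`.

```python
import unittest
from typing import List


class FramewiseProjection:
    """Hypothetical stand-in layer: maps every frame from c_in to c_out channels.

    Input shape [N, T, c_in] -> output shape [N, T, c_out] (as nested lists).
    """

    def __init__(self, c_in: int, c_out: int):
        # Fixed, deterministic weights keep the test reproducible.
        self.weight = [[0.01 * (i + j) for j in range(c_in)] for i in range(c_out)]

    def __call__(self, x: List[List[List[float]]]) -> List[List[List[float]]]:
        # Dot product of each frame with each weight row gives c_out values per frame.
        return [
            [[sum(w * v for w, v in zip(row, frame)) for row in self.weight] for frame in seq]
            for seq in x
        ]


def shape(x) -> tuple:
    """Shape of a rectangular nested list, e.g. [[1, 2, 3]] -> (1, 3)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


class TestFramewiseProjection(unittest.TestCase):
    def test_output_shape(self):
        n, t, c_in, c_out = 2, 5, 4, 3
        layer = FramewiseProjection(c_in, c_out)
        x = [[[1.0] * c_in for _ in range(t)] for _ in range(n)]
        # Batch and time dimensions must survive; only the channel size changes.
        self.assertEqual(shape(layer(x)), (n, t, c_out))
```

It can be run with `python -m unittest`, pointed at the module that holds the test class. One small test per layer keeps a shape regression local to the layer that caused it.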
encoding: torch.Tensor,
) -> torch.Tensor:
"""
x --- [N, seq_len, encoder_embedding_dim]
These shape docstrings also need to be reformatted to match the other models so they are compatible with our documentation.
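A minimal sketch of a function documented with a `Shapes:` section, assuming the other models use Google-style docstrings (`Args:`/`Returns:`) plus per-argument shapes written with the `:math:` role; check an existing model such as vits.py for the exact convention. `make_padding_mask` is a hypothetical helper chosen only to carry the docstring.

```python
from typing import List, Optional


def make_padding_mask(lengths: List[int], max_len: Optional[int] = None) -> List[List[bool]]:
    """Build a padding mask from per-sequence lengths.

    Args:
        lengths (List[int]): Number of valid frames in each sequence of the batch.
        max_len (int, optional): Width of the mask. Defaults to ``max(lengths)``.

    Returns:
        List[List[bool]]: ``True`` for valid frames, ``False`` for padding.

    Shapes:
        - lengths: :math:`[B]`
        - output: :math:`[B, T_{max}]`
    """
    if max_len is None:
        max_len = max(lengths)
    # Frame t of a sequence is valid when t < its length.
    return [[t < length for t in range(max_len)] for length in lengths]
```

Under the same convention, the line under review (`x --- [N, seq_len, encoder_embedding_dim]`) would move into the `Shapes:` section as an `x` entry with a `[B, T, C]`-style shape.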
@erogol The most recent push is code I know works, and it is currently training a model. After I confirm it converges, I'll clean up the code and write the docs for the model.
@erogol I'm fixing a bug in the unit tests, but the model code is ready for review.
@dataclass
class DelightfulTTSConfig(BaseTTSConfig):
Consider writing docstrings for the config arguments. It would help readers understand the architecture better.
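A minimal sketch of per-argument docstrings on a dataclass config, in the Google `Args:` style. `PitchPredictorConfig` and all of its fields and defaults are hypothetical and do not describe the actual `DelightfulTTSConfig`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PitchPredictorConfig:
    """Hypothetical sub-config showing how each argument can be documented.

    Args:
        input_channels (int):
            Channel size of the encoder outputs fed to the predictor. Defaults to 256.
        kernel_size (int):
            Kernel size of each 1D convolution. Defaults to 3.
        dropout (float):
            Dropout probability applied after each convolution. Defaults to 0.5.
        conv_channels (List[int]):
            Output channels of the stacked convolutions. Defaults to ``[256, 256]``.
    """

    input_channels: int = 256
    kernel_size: int = 3
    dropout: float = 0.5
    # default_factory gives every instance its own list instead of a shared one.
    conv_channels: List[int] = field(default_factory=lambda: [256, 256])
```

`field(default_factory=...)` is required for the list default: a bare mutable default such as `[256, 256]` makes the `@dataclass` decorator raise `ValueError` when the class is defined.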
encoder_outputs_res = encoder_outputs

# Pitch predictor
pitch_pred, avg_pitch_target, pitch_emb = self.pitch_adaptor.get_pitch_embedding_train(
Do we normalize the ground truth pitch somewhere?
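For reference, one common way to normalize ground-truth pitch (used, for example, in FastPitch-style models) is a z-score over voiced frames with dataset-level statistics, leaving unvoiced frames at 0. The sketch below illustrates that technique under those assumptions; it does not show what this PR actually does, and `pitch_stats` and `normalize_pitch` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def pitch_stats(pitch_tracks: List[List[float]]) -> Tuple[float, float]:
    """Mean and population std of F0 over voiced frames (f0 > 0) in the dataset."""
    voiced = [f0 for track in pitch_tracks for f0 in track if f0 > 0.0]
    return mean(voiced), pstdev(voiced)


def normalize_pitch(pitch: List[float], mu: float, sigma: float) -> List[float]:
    """Z-score voiced frames; unvoiced frames (f0 == 0) stay at 0."""
    return [(f0 - mu) / sigma if f0 > 0.0 else 0.0 for f0 in pitch]
```

The statistics are computed once over the training set and reused at inference. Note that a voiced frame exactly at the mean also maps to 0, so a separate voiced/unvoiced mask is needed if the model must tell the two apart.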
@@ -0,0 +1,89 @@
import torch
You need to do the gradient pass test as we discussed before.
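A minimal sketch of one common form of gradient pass test, assuming that is the test meant here: snapshot the parameters, run one optimization step, and check that every parameter changed, which flags layers disconnected from the loss. A two-parameter linear model with hand-derived MSE gradients stands in for the real network so the sketch runs without PyTorch; `mse_and_grads`, `sgd_step`, and `gradient_pass_check` are hypothetical names. On the actual model, the same check would compare `model.parameters()` against a deep copy taken before `optimizer.step()`.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]


def mse_and_grads(params: Params, batch: List[Tuple[float, float]]) -> Tuple[float, Params]:
    """Loss and analytic gradients of y_hat = w * x + b under mean squared error."""
    n = len(batch)
    loss, grad_w, grad_b = 0.0, 0.0, 0.0
    for x, y in batch:
        err = params["w"] * x + params["b"] - y
        loss += err * err / n
        grad_w += 2.0 * err * x / n  # d(loss)/dw
        grad_b += 2.0 * err / n  # d(loss)/db
    return loss, {"w": grad_w, "b": grad_b}


def sgd_step(params: Params, grads: Params, lr: float) -> Params:
    """One plain gradient-descent update, returning new parameter values."""
    return {name: value - lr * grads[name] for name, value in params.items()}


def gradient_pass_check(params: Params, batch: List[Tuple[float, float]], lr: float = 0.1) -> List[str]:
    """Return the names of parameters that did NOT change after one update step."""
    before = dict(params)  # snapshot of the initial parameters
    _, grads = mse_and_grads(params, batch)
    after = sgd_step(params, grads, lr)
    return [name for name in before if after[name] == before[name]]
```

For instance, `gradient_pass_check({"w": 0.0, "b": 0.0}, [(0.0, 1.0)])` returns `["w"]`, because every input is zero and `w` receives no gradient; in a full model, a parameter reported this way usually belongs to a branch whose output never reaches the loss.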
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Any idea when this will be merged? And will it have a pre-trained model?
Is this PR for Delightful TTS 1 or Delightful TTS 2 (https://arxiv.org/abs/2207.04646)?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
@loganhart420 let's wrap up this PR.
Will there be a trained model?
--
*Mr. Bashir,*
*CEO, AMOXT Pvt. Ltd*
Doing it now. Should I just put the pretrained weights in a draft release?
yea
Awesome!
* add configs
* Update config file
* Add model configs
* Add model layers
* Add layer files
* Add layer modules
* change config names
* Add emotion manager
* Fix missing ap bug
* Fix missing ap bug
* Add base TTS e2e class
* Fix wrong variable name in load_tts_samples
* Add training script
* Remove range predictor and gaussian upsampling
* Add helper function
* Add vctk recipe
* Add conformer docs
* Fix linting in conformer.py
* Add Docs
* remove duplicate import
* refactor args
* Fix bugs
* Remove emotion embedding
* remove unused arg
* Remove emotion embedding arg
* Remove emotion embedding arg
* fix style issues
* Fix bugs
* Fix bugs
* Add unittests
* make style
* fix formatter bug
* fix test
* Add pyworld compute pitch func
* Update requirements.txt
* Fix dataset bug
* Change layer norm to instance norm
* Add missing import
* Remove emotions.py
* remove ssim loss
* Add init layers func to aligner
* refactor model layers
* remove audio_config arg
* Rename loss func
* Rename to delightful-tts
* Rename loss func
* Remove unused modules
* refactor imports
* replace audio config with audio processor
* Add change sample rate option
* remove broken resample func
* update recipe
* fix style, add config docs
* fix tests and multispeaker embd dim
* remove pyworld
* Make style and fix inference
* Split tts tests
* Fixup
* Fixup
* Fixup
* Add argument names
* Set "random" speaker in the model Tortoise/Bark
* Use a diff f0_cache path for delightful tts
* Fix delightful speaker handling
* Fix lint
* Make style

---------

Co-authored-by: loganhart420
Co-authored-by: Eren Gölge
Model implementation from: https://arxiv.org/pdf/2110.12612.pdf